Abstract

This paper deals with a model of cellular growth called
"Epigenetic Tracking", whose key features are: i) distinction
between "normal" and "driver" cells; ii) presence in
driver cells of an epigenetic memory that holds the posi- 
tion of the cell in the driver cell lineage tree and represents 
the source of differentiation during development. In the 
first part of the paper the model is proved able to gener- 
ate arbitrary target shapes of unmatched size and variety 
by means of evo-devo techniques, thus being validated as 
a model of embryogenesis and cellular differentiation. In 
the second part of the paper it is shown how the model 
can produce artificial counterparts for some key aspects of 
multicellular biology such as junk DNA, ageing and car- 
cinogenesis. If individually each of these topics has been 
the subject of intense investigation and modelling effort, 
to our knowledge no single model or theory seeking to 
cover all of them under a unified framework has been put 
forward as yet: this work contains such a theory, which 
makes Epigenetic Tracking a potential basis for a project 
of Artificial Biology. 



1 Introduction 

This paper is concerned with a model of cellular growth
called "epigenetic tracking" (described in (Fontana, 2008)),
which belongs to the field of Artificial Embryology
or Computational Development. The model is tested on
its ability to generate arbitrary target shapes by means
of evo-devo techniques, a task which is taken as a measure
of the model's goodness and at which it appears to
be quite successful. Subsequently, the implications of the
model are explored, in relation to some key aspects of 
cell biology: embryogenesis, junk DNA, ageing and car- 
cinogenesis: the model is shown able to produce artificial 
counterparts of each of these aspects (albeit with a re- 
duced level of complexity). The paper is divided into 
two parts. The first part reviews the previous work in 
the field of artificial embryology (section 2.1), describes
the model of cellular growth (section 2.2) and reports the
results of the experiments performed (section 2.3). The 
second part shows how the model is able to generate arti- 
ficial counterparts for each of the following key aspects of 
cell biology: embryogenesis (section 3), junk DNA (sec- 
tion 4), ageing (section 5) and carcinogenesis (section 6). 
Each of these sections opens with a description of the 
experimental evidence relevant to the aspect of biology 
considered, reviews the main existing models and theories 
and finally describes the artificial counterpart produced 
by epigenetic tracking. The topic of stem cells, which cuts
across several of these aspects, is also discussed. Finally, section 7
draws conclusions and outlines future research directions. 

2 The Model of Cellular Growth 

2.1 Artificial Embryology related work 

The previous work in the field of Artificial Embryology
(see (Kumar and Bentley, 2003; Stanley and Miikkulainen,
2003) for a comprehensive review) can be divided
into two broad categories: the grammatical approach and
the cell chemistry approach. The grammatical approach,
originated by Lindenmayer (Lindenmayer, 1968), evolves
sets of rules in the form of grammatical rewrite systems;
the grammar can be context-free or context-sensitive and 
can utilise parameters; variations on this theme include 
using instruction trees or directed graphs in place of ac- 
tual grammars. L-systems were employed as a means of 
describing the complex fractal patterns observed in na- 
ture and particularly the architecture of plants. The cell 
chemistry approach draws inspiration from the early work 
of Turing (Turing, 1952), who introduced a mathematical
model of diffusion and reaction within a physical substrate.
This approach attempts to mimic more closely
how physical structures emerge in biology; cells are ar- 
ranged in a physical space where simulated proteins can 
be sent as signals from one cell to another, as in nature. 



Within the grammatical approach, Sims (Sims, 1994)
used directed graphs to evolve the body morphologies
and neural networks of artificial creatures in a simulated
3D physical world; in these graphs, a node represents a
body part and an edge specifies how body parts are connected.
Using a domain similar to Sims', Hornby and
Pollack (Hornby and Pollack, 2002) applied L-systems to
the simultaneous evolution of the body morphologies and
neural networks of artificial creatures in a simulated 3D
physical environment. Cangelosi, Nolfi and Parisi (Cangelosi
et al., 1994) devised a model of neural development
which includes cell division and cell migration in addition
to axonal growth and branching; the development process
shows successive phases of functional differentiation
and specialisation. Gruau's Cellular Encoding (Gruau
et al., 1996) uses grammar trees to encode steps in the
development of a neural network starting from a single
ancestor cell; the grammar tree contains developmental
instructions at each node.

Within the cell chemistry approach, Random Boolean
Networks (RBNs) were originally developed by Kauffman
as a model of genetic regulatory networks (Kauffman,
1969); in the context of the development of multicellular
organisms, the attractors of RBNs are interpreted
as the different "cell types" of the organism. De
Garis (De Garis, 1999) developed a model for evolving
shapes in 2D reproductive cellular automata; the model
was successful in evolving convex shapes but non-convex
shapes (e.g. the L-shape) presented a problem. Bongard
and Pfeifer (Bongard and Pfeifer, 2001) proposed
a minimal model of ontogenetic development to evolve
both the morphology and neural control of agents that
perform a block-pushing task in a physically realistic, virtual
environment. Inspired by the cell adhesion process,
Hogeweg (Hogeweg, 2003) developed a model to simulate
morphogenetic processes such as cell migration or engulfing,
succeeding in evolving complex artificial organisms.
Miller and Banzhaf (Miller and Banzhaf, 2003) developed
artificial organisms (the French flag) based on a method
called Cartesian Genetic Programming, which evolves a
developmental program inside cells.



2.2 The Model 

In our model the phenotype of the organism is represented
as a 2-dimensional array of square-shaped cells,
each cell being associated with a position on a grid. Development
starts with a single cell placed in the middle of
the grid, and unfolds in n artificial age steps, counted by
the variable "Global Organismal Age" (GOA), which runs
from 0 to n-1 (n is a parameter). The term "global" refers
to the fact that the variable GOA is shared by all cells
(and can therefore be considered the global "clock" of
the organism). Four variables are associated with each cell:

• a flag indicating whether the cell is "driver" or "nor- 
mal"; 



• the "genome" , organised as an array of "change op- 
erators", which is identical in all cells; 

• the "cell epigenetic type" (GET), organised as an 
array of n integers (n is the number of artificial age 
steps), which is not identical in all cells; the GET is 
present only in driver cells; 

• an integer representing the cell's colour. In the current
implementation four colour values are foreseen
(0,1,2,3); an extra value (-1) indicates cell absence.
Figure 1: Example of proliferation; the column corresponding
to the GOA value is highlighted with a thick
frame (please note that column numbering starts from
0).



Cells belong to two categories: "driver" cells (coloured
in yellow in the figures) and "normal" cells (coloured in
orange or blue). The basic difference between driver and
normal cells is that the former can be instructed by the
genome (by means of an operator whose left part matches
the GET value of the cell) to proliferate or to induce apoptosis
in the surrounding area. Figure 1 shows an example:
a driver cell associated with a GET value labelled with
"A" (called "mother cell") proliferates in an area around 
it (called "change area" , delimited by a dotted line in the 
figure). While proliferating, it mostly generates normal 
cells (which fill the change area) and other driver cells, 
which are much fewer in number and "dot" the change 
area. 

A key point is the assignment of GET values to
the newly created driver cells. Each new driver cell
is assigned a new GET value, obtained by starting from the mother
cell's GET value (the array [2,0,0] in the figure, labelled
with "A") and adding 1 to the value in the i-th position
of the array at each new assignment, where i is the current
GOA value (1 in the figure, corresponding to the second
column, since column numbering starts from 0); with reference
to the figure, the new driver cells are assigned the values
[2,1,0], [2,2,0], ..., labelled with "B", "C", etc. In practice
the variable GET holds the position of the driver cell in
the driver cell lineage tree, or simply driver cell tree
(DGT, the set of all driver cells, which has a tree structure):
this ensures that the new GET values are all different
from the mother's value and from each other. Whether
one of these new GET values will become the centre of
another proliferation event depends on the presence in
the genome of an operator whose left part matches that
value.
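The assignment rule just described can be illustrated with a minimal Python sketch. The function name `assign_daughter_gets` and its parameters are hypothetical and are not part of the model's specification; the sketch only reproduces the arithmetic of the rule (add 1 at position GOA for each new driver cell).

```python
from typing import List


def assign_daughter_gets(mother_get: List[int], goa: int,
                         n_daughters: int) -> List[List[int]]:
    """Return the GET arrays given to the new driver cells of one proliferation.

    Hypothetical helper: the k-th new driver cell receives the mother's GET
    with k added at index `goa` (the current Global Organismal Age).
    """
    daughters = []
    for k in range(1, n_daughters + 1):
        child = list(mother_get)   # copy, so the mother's array is untouched
        child[goa] += k            # add 1 at position GOA per new assignment
        daughters.append(child)
    return daughters


# Values from the example in the text: mother [2,0,0], GOA = 1.
print(assign_daughter_gets([2, 0, 0], goa=1, n_daughters=2))
# → [[2, 1, 0], [2, 2, 0]]
```

Because each daughter differs from the mother and from its siblings in the GOA-th position, the values are distinct by construction, which is the property the text relies on.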

The genome, as we said, is organised as an array of
change operators (see figure 2). Excluding mutations, the
genome is not modified during development and is identical
in all cells. Each change operator has a left part and
a right part. The left part consists of a variable called
XOA, having the same structure as the variable GOA,
and a variable called XET, having the same structure as
the variable GET: if the XOA value is equal to the GOA
value and the XET value is equal to the GET value of a
given driver cell, the operator (in this case we speak of a
"timed" operator) is activated and the code specified
in the right part is executed for that cell; if the XOA
value is -1, then the GOA value is ignored and the activation
of the operator (in this case called a "non-timed" operator)
depends only upon the GET-XET match. The XET
value is preceded by two parameters, one (OP, "order of
precedence") indicating which operator takes precedence
in case of multiple matches (XET values are not guaranteed
to be unique) and one (ON) indicating whether the
operator is "structurally" inactive or not. The right part
of the operator has:

• a field with the coordinates of the rectangle which de- 
limits the change area (row and column values of the 
north-west and south-east corners of the rectangle); 

• a field holding a "master switch" (MS0) that defines
the shape of the change area ("rectangular", value = 0;
"diagonal left", value = 1; "diagonal right", value = 2);

• a field holding a second "master switch" (MS1)
that defines the type of "change event" that is going
to occur ("proliferation", value = 0; "apoptosis",
value = 1);

• a field with a parameter (DT) that specifies the thickness
of the diagonal (valid only if MS0 = 1 or 2);

• a field with a parameter (GO) that specifies the
colour of the newly created cells (both normal and
driver).
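As an illustration of the operator structure and of the activation rule, the following Python sketch may help. All identifiers are hypothetical. Two points are assumptions that the text does not settle: the ON parameter is modelled as a boolean that is true when the operator is not structurally inactive, and among multiple matches the operator with the lowest OP value is taken to win.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class ChangeOperator:
    # Left part
    xoa: int                          # age to match against GOA; -1 = non-timed
    xet: Tuple[int, ...]              # value to match against a driver cell's GET
    op: int                           # order of precedence
    on: bool                          # ASSUMPTION: False = structurally inactive
    # Right part
    area: Tuple[int, int, int, int]   # NW row, NW col, SE row, SE col
    ms0: int                          # 0 rectangular, 1 diagonal left, 2 diagonal right
    ms1: int                          # 0 proliferation, 1 apoptosis
    dt: int                           # diagonal thickness (used only if ms0 is 1 or 2)
    go: int                           # colour of the newly created cells


def is_activated(oper: ChangeOperator, goa: int, get: Sequence[int]) -> bool:
    """Timed operators need XOA == GOA; non-timed ones (XOA == -1) ignore GOA."""
    if not oper.on or oper.xet != tuple(get):
        return False
    return oper.xoa == -1 or oper.xoa == goa


def select_operator(genome: List[ChangeOperator], goa: int,
                    get: Sequence[int]) -> Optional[ChangeOperator]:
    """Pick one activated operator; ASSUMPTION: the lowest OP value wins."""
    candidates = [o for o in genome if is_activated(o, goa, get)]
    return min(candidates, key=lambda o: o.op) if candidates else None


genome = [
    ChangeOperator(0, (0, 0, 0), 2, True, (0, 0, 9, 9), 0, 0, 0, 1),
    ChangeOperator(-1, (0, 0, 0), 1, True, (0, 0, 4, 4), 0, 1, 0, 2),
]
chosen = select_operator(genome, goa=0, get=[0, 0, 0])
print(chosen.ms1)   # event type of the operator that took precedence
```

If the precedence convention in the actual implementation is the opposite one, only the `min` in `select_operator` would need to become `max`.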

In case of proliferation the change area is filled with
newly created cells: most of the cells generated are normal
cells, some are driver cells. The driver cells are much
fewer in number (usually a "linear normal to driver ratio"
of 5 has been used, corresponding to a 2-dimensional
ratio of 5 × 5 = 25) and are deployed evenly over the change
area (the precise algorithm used to place the driver cells is not
important as long as it ensures a uniform distribution).
In case of apoptosis, all the cells contained in the change
area "die", i.e. are deleted from the grid. The different
types of change events (proliferation in different shapes,
i.e. rectangular, diagonal left and diagonal right, and apoptosis) can
be regarded as "painting primitives", i.e. basic painting
actions that can be combined together to yield more
complicated shapes. At the end of all change events the
mother cell is always removed from the grid.

Figure 2: The genome.

Figure 3: Example of development in four steps, driven
by three operators. All "snapshots" are taken at the end
of the relevant step; the zygote is present on the grid at
the end of step -1, before the first age step, which is step
0. Normal newly generated cells are shown in orange,
normal old cells in blue, driver cells in yellow.
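A rectangular proliferation can be sketched as follows. This is only an illustration under stated assumptions: the function `proliferate_rect` is hypothetical, the regular lattice used to place driver cells is one possible uniform placement (the text leaves the exact algorithm open), and the sketch handles neither the removal of the mother cell nor a change area that is already occupied.

```python
from typing import List, Tuple

EMPTY = -1  # colour value indicating cell absence, as in the model


def proliferate_rect(grid: List[List[int]], nw: Tuple[int, int],
                     se: Tuple[int, int], colour: int,
                     ratio: int = 5) -> List[Tuple[int, int]]:
    """Fill the rectangle nw..se (inclusive) with cells of the given colour.

    Returns the positions chosen for driver cells: one per `ratio` cells along
    each axis, i.e. one driver per ratio*ratio cells (ASSUMED lattice layout).
    """
    (r0, c0), (r1, c1) = nw, se
    drivers = []
    for r in range(r0, r1 + 1):
        for c in range(c0, c1 + 1):
            grid[r][c] = colour
            if (r - r0) % ratio == ratio // 2 and (c - c0) % ratio == ratio // 2:
                drivers.append((r, c))
    return drivers


grid = [[EMPTY] * 20 for _ in range(20)]
drivers = proliferate_rect(grid, nw=(5, 5), se=(14, 14), colour=1)
print(drivers)   # → [(7, 7), (7, 12), (12, 7), (12, 12)]
```

With a 10 × 10 change area and a linear ratio of 5, the sketch creates 100 cells of which 4 are drivers, matching the 2-dimensional ratio of 25 quoted in the text.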

A special procedure is required if the change area is not 
empty. In this case the cells present must be either moved 
to other locations in the grid or removed altogether (over- 
written). The solution chosen consists in first removing 
the cells present in the change area, carrying out the pro- 
liferation and finally redeploying the removed cells onto 
the grid. The order in which cells are removed and re- 
deployed corresponds to their distance from the mother 
cell's position (experiments have been carried out with 
different types of distance). 

Figure 3 shows an example of development in four age
steps (GOA = -1, 0, 1, 2) steered by three change operators:
the first (a rectangular proliferation) triggered by
the GET value labelled "A" in step 0, the second (a rectangular
apoptosis) triggered by the GET value labelled
"D" in step 1, the third (a diagonal right proliferation)
triggered by the GET value labelled "E" also in step 1.
The GET value "A" was present at the end of step -1
(starting point). The GET values "D" and "E" have been
created in step 0. This example illustrates the "core"
of the machine: a GET value produces a change event,
which in turn produces other GET values, some of which
produce other change events and so on, in an indefinitely
sustainable way.

Figure 4: The dolphin, the couple and the hand (dynamical
view, target contour superimposed). In the lower part
the development sequence of the dolphin.

Figure 5: The frog, the baby and the stomach (dynamical
view, target contour superimposed). In the lower part the
development sequence of the stomach.
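The "core" loop described above can be sketched on the driver cell tree alone, leaving out the geometry. The Python fragment below is a simplified illustration, not the actual implementation: the function `develop` and the encoding of an operator as a pair (XOA, number of new driver cells) are assumptions, only proliferation events are modelled, and driver cells created in a step are assumed to become eligible for activation from the following step.

```python
from typing import Dict, List, Tuple

Get = Tuple[int, ...]


def develop(operators: Dict[Get, Tuple[int, int]], n_steps: int) -> List[Get]:
    """Run n_steps age steps and return the GET values of the surviving drivers.

    `operators` maps an XET value to (XOA, number of new driver cells);
    XOA == -1 denotes a non-timed operator. Hypothetical encoding.
    """
    drivers: List[Get] = [tuple([0] * n_steps)]       # the zygote
    for goa in range(n_steps):
        next_drivers: List[Get] = []
        for get in drivers:                            # snapshot of this step
            rule = operators.get(get)
            if rule is not None and rule[0] in (-1, goa):
                n_new = rule[1]
                for k in range(1, n_new + 1):          # new, unseen GET values
                    child = list(get)
                    child[goa] += k
                    next_drivers.append(tuple(child))
                # the mother cell is removed after the change event
            else:
                next_drivers.append(get)
        drivers = next_drivers
    return drivers


operators = {(0, 0, 0): (0, 2), (2, 0, 0): (1, 2)}
print(develop(operators, n_steps=3))
# → [(1, 0, 0), (2, 1, 0), (2, 2, 0)]
```

In this toy run the driver (1,0,0) matches no operator and simply persists, while (2,0,0) triggers a second event and is replaced by its daughters.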

Let us summarise the key features of our model: i) the
distinction between "normal" and "driver" cells; ii) the
implementation of the proliferation/apoptosis events in
such a way that many cells are created/deleted at once;
iii) the presence in driver cells of an epigenetic memory,
which holds the position of the cell in the driver cell tree
and represents the source of differentiation; iv) the mechanism
for assigning GET values to the newly generated
driver cells during a proliferation event, which ensures
that each new driver cell is assigned a new, previously
unseen GET value. We argue that this set of features
is unique to our model and makes it possible to reach a high
level of performance in terms of both size and variety of
the evolved shapes, as will be shown in the next section.

2.3 Experiments 

The model described in the previous section has been
tested on the problem of artificial morphogenesis and cell
differentiation achieved by means of evolutionary techniques,
i.e. the task of generating predefined 2-dimensional
shapes by evolving genomes that guide the development
of the shape starting from a single cell. The experimental
procedure consists in evolving a population of genomes,
at each generation letting the development unfold for each
genome (starting from a single cell with GET = [0,...,0]
placed in the middle of the grid and running GOA from
0 up to a maximum value), and then using the adherence
of the shape at the end of the development to the
target shape as the fitness measure. The genetic population
is composed of 600 individuals (represented as strings of
quaternary digits), undergoing elitist selection for up to
20000 generations. The parameters of the Genetic Algorithm
(GA) are 50% single point crossover and a mutation
rate of 0.1% per digit. The fitness function formula is the
same as that adopted by H. de Garis (De Garis, 1999):

$$F = \frac{\mathit{ins} - \mathit{outs}}{\mathit{des}} \tag{1}$$

where ins is the number of cells of the evolved shape
falling inside (and matching the colour of) the target
shape, outs is the number of cells of the evolved shape
falling outside the target shape, and des is the total number of
cells of the target shape. At each age step two views
are possible for the developing shape: one that shows the
"dynamics" of the change operators, and one that shows
the colours of cells. In the "dynamical view" driver cells
are coloured in yellow, normal cells are coloured in orange
if they have just been created (i.e. in the current step) and
in blue if they were created in one of the previous
steps; areas where cells have been deleted by an apoptosis
event are coloured in grey. In the "colour view" (which
of course makes sense only for colour targets) cells are
shown with their actual colours.
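The fitness measure of equation (1) can be computed as in the following Python sketch. The function name and the grid encoding (lists of colour values, with -1 for an absent cell) are illustrative choices. One assumption goes beyond the text: a cell lying inside the target but with the wrong colour is counted neither in ins nor in outs.

```python
from typing import List

EMPTY = -1  # colour value indicating cell absence


def fitness(evolved: List[List[int]], target: List[List[int]]) -> float:
    """F = (ins - outs) / des for two grids of equal size.

    ASSUMPTION: wrongly coloured cells inside the target contribute to
    neither `ins` nor `outs`. The target must contain at least one cell.
    """
    ins = outs = des = 0
    for row_e, row_t in zip(evolved, target):
        for e, t in zip(row_e, row_t):
            if t != EMPTY:
                des += 1
                if e == t:
                    ins += 1        # inside the target and colour matches
            elif e != EMPTY:
                outs += 1           # evolved cell outside the target shape
    return (ins - outs) / des


target = [[1, 1], [EMPTY, EMPTY]]
evolved = [[1, 2], [3, EMPTY]]
print(fitness(evolved, target))   # → 0.0  (ins = 1, outs = 1, des = 2)
print(fitness(target, target))    # → 1.0  (perfect match)
```

A perfect match gives F = 1, and the value can become negative when the evolved shape spills far outside the target.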

Simulations have been conducted with a number of different
target shapes; the targets have been chosen with
the objective of testing the method on shapes as diverse as
possible, to prove its effectiveness in generating any kind
of shape. All targets are 100x100 multi-cellular arrays:
the limited computational resources available prevented
us from testing larger shapes. Figures 4-7 show
the results of simulations conducted with black-and-white
and colour targets (in some simulations slightly different
painting primitives have been used). As we can see, all
target shapes have been approximated to a good degree;
colour targets have proved more difficult to evolve, as one
might have expected. To our knowledge, no other method
is able, by means of evo-devo techniques, to generate target
shapes of this size and variety. The remainder of
this paper will be dedicated to showing how the model,
besides being successful at the task it was designed for, relates
to some key aspects of biology, namely embryogenesis,
junk DNA, ageing and carcinogenesis. Embryogenesis
will be examined first.
Figure 6: The head. In the upper part, on the left the
target shape, in the middle the best evolved shape in
dynamical view and on the right the best evolved shape
in colour view. In the lower part some development steps.

Figure 7: The heart. In the upper part, on the left the
target shape, in the middle the best evolved shape in
dynamical view and on the right the best evolved shape
in colour view. In the lower part some development steps.



3 Embryogenesis 

3.1 Experimental evidence on embryogenesis

Embryogenesis is the process by which the embryo is 
formed and develops. It starts with the fertilisation of 
the egg, then called a zygote. The zygote undergoes 
rapid cell divisions with no significant growth, produc- 
ing a cluster of cells that is the same size as the original 
zygote, called a morula. The next stage is the blastula, a
spherical layer of cells surrounding a fluid-filled or yolk-filled
cavity; mammals then form a structure called the blastocyst,
characterised by an inner cell mass not present
in the blastula. During the next stage, called gastrulation,
cells migrate to the interior of the blastula, forming three 
(in triploblastic animals) germ layers, referred to as ecto- 
derm, mesoderm and endoderm. At some point after the 
different germ layers are defined, organogenesis begins; 
the first stage in vertebrates is called neurulation, where 
the neural plate folds forming the neural tube, but from 
now on embryogenesis follows no common pattern among 
the different taxa of the animal kingdom. 

Embryogenesis is the result of three coordinated pro- 
cesses: morphogenesis, cell growth and cell differentia- 
tion. Morphogenesis is concerned with the particular as- 
pect of embryogenesis relevant to the shapes of tissues, 
organs and entire organisms and the positions of the var- 
ious specialised cell types. Cell growth and differentiation 
can take place in cell culture or inside of tumor cell masses 
without the normal morphogenesis that is seen in an in- 
tact organism. In the human embryo, the change from 
a cluster of nearly identical cells at the blastula stage to 
a post-gastrulation embryo with structured tissues and 
organs is controlled by the genetic "program" and can
be modified by environmental factors. A key role in the
process of embryogenesis is thought to be played by stem 
cells. 

Stem cells are cells found in most, if not all, multi- 
cellular organisms. The classical definition of a stem cell 
requires that it possess two properties: i) self-renewal 
(the ability to go through numerous cycles of cell division 
while maintaining the undifferentiated state) and ii) po- 
tency (the capacity to differentiate into specialised cell 
types). Two types of stem cells exist: embryonic stem cells
(found in the inner cell mass of the blastocyst) and adult
stem cells (found in adult tissues). Embryonic stem
(ES) cells are pluripotent: this means they are able to
differentiate into all derivatives of the three primary germ
layers, including each of the more than 220 cell types in
the adult body; when given no stimuli for differentiation,
embryonic stem cells maintain pluripotency through multiple
cell divisions. Adult stem cells are undifferentiated
cells found throughout the body after embryonic development
that divide to replenish dying cells and regenerate
damaged tissues; multipotency distinguishes adult stem
cells from pluripotent embryonic stem cells: they are
only able to form a limited number of cell types.
Examples of adult stem cells are hematopoietic stem cells and
colon stem cells. 

3.2 Artificial embryogenesis 

The model of cellular growth called "epigenetic tracking" 
has been tested experimentally with the problem of arti- 
ficial morphogenesis and cell differentiation, implemented 
respectively through the shaping and colouring of cellular 
sets: therefore its interpretation as a model of artificial 
morphogenesis and cell differentiation is straightforward. 
Given a model of cellular growth, a key question is how to 
measure the model's goodness. It is our belief that a very
important ingredient in assessing a model's goodness is 
represented by its evolvability: in this respect our model's 
ability to evolve complex target shapes can be considered 
experimentally demonstrated. The level of complexity of 
the shapes generated is of course still very far from the 
level of complexity displayed by nature, but nevertheless 
very high compared to other growth models. 

In the model of cellular growth proposed, driver cells 
take the role of embryonic stem cells. The artificial coun- 
terpart of the property of "potency" (characteristic of ES 
cells), is represented by the capacity of driver cells to give 
rise to cells of any colour. As far as the "self-renewal" 
property is concerned, the situation is a bit different, in 
that ES cells, when proliferating, give rise to other iden- 
tical ES cells, while in our model this is not strictly true: 
driver cells, when proliferating, give rise to other driver 
cells which have different GET values. The GET values
of the mother and of the daughters are of course
similar (they share the upper sub-tree), but not identical.
Driver cells also have one property in common with
Spemann's organisers: if a driver cell (a Spemann's or- 
ganiser) destined to give rise to a certain shape (embryo) 
part is moved to a different position of the growing shape 
(embryo), that shape (embryo) part will grow in the new, 
ectopic position. 

The artificial genome corresponds to the natural 
genome: it is defined as that part of genetic informa- 
tion which is shared by all cells. The cell epigenetic type 
(GET) corresponds to the cell epigenetic memory (stored 
in DNA's methylation patterns); it represents the part of 
genetic information that is different from cell to cell and, 
as such, it constitutes the primary source of the information
necessary for cellular differentiation. The GET can
be thought of as the provider of the artificial counterpart of
the "first transcription factor", which gives origin to the
whole cascade of gene activations that build up the transcriptome
and determine the cellular identity; for driver
cells such factor comes from within, while for non-driver 
cells it has to be supplied from the outside. 

Speaking about features of the model that lack biolog- 
ical plausibility, at present inter-cellular signals are not 
modelled; as a result, the local environment has no influence
on cell fate determination, which is in contrast
with the biological evidence. Moreover, in our model any 
change event is carried out by a single operator activated 
by a single GET value; in other words, we have a genetic 
regulatory network in which the outputs are connected di- 
rectly to the inputs, with no "hidden layer"; we know by 
contrast that in biological systems events are determined 
by the interplay of many genes. It is in our plans to add 
these features to the model and it is our opinion that 
such additions will be beneficial in terms of robustness to 
perturbations, with no negative impacts on evolvability. 

We end this section by noting that, even if the target
shape coincided with the picture of a real living
being (an animal, for instance), the sequence of morphogenetic
steps our algorithm would follow to develop
the shape would in general be different from the actual 
sequence followed by the animal embryo during its devel- 
opment. This brings us to conclude that our algorithm 
is not able to reproduce exactly biological embryogenesis; 
rather, it is able to produce a phenomenon that can be 
considered of analogous nature. Needless to say, biologi- 
cal reality is much more complex compared to our experi- 
ments, which are nonetheless significantly more complex, 
in terms of size and variety of the shapes generated, than 
most (if not all) related work. 

4 Junk DNA 

4.1 Experimental evidence and theories on junk DNA

In molecular biology, "junk DNA", or "noncoding DNA", 
is a collective label for the portions of the DNA sequence 
of a genome for which no function has been identified. 
About 95% of the human genome has been designated 
as "junk", including most sequences within introns and 
most intergenic DNA. While much of this sequence may 
be an evolutionary artifact that serves no present-day 
purpose, some junk DNA may function in ways that are 
not currently understood. Moreover, the conservation of 
some junk DNA over many millions of years of evolution 
may imply an essential function: according to a comparative
study of over 300 prokaryotic and over 30 eukaryotic
genomes (Ahnert et al., 2008), eukaryotes appear
to require a minimum amount of non-coding DNA
(in humans the predicted minimum is about 5% of the
total genome). Some chromosomal regions are composed
of the now-defunct remains of ancient genes, known as
pseudogenes, which were once functional copies of genes
but have since lost their protein-coding ability. As much
as 25% of the human genome is recognisably formed of
retrotransposons (Deininger and Batzer, 2002): new
research suggests that genome size variation in at least
two kinds of plants is mostly due to retrotransposons
(Piegu et al., 2006).



There are some hypotheses, none conclusively estab- 
lished, for how junk DNA arose and why it persists in the 
genome: 

• Junk DNA might provide a reservoir of sequences 
from which potentially advantageous new genes can 
emerge. In this way, it may be an important genetic 
basis for evolution; 

• Some junk DNA could be spacer material that al- 
lows enzyme complexes to form around functional el- 
ements more easily. In this way, it could serve a use- 
ful function regardless of the actual base sequence; 
• Some portions of junk DNA could serve presently unknown
regulatory functions, controlling the expression
of certain genes during the development of an
organism from embryo to adult (Woolfe, 2005).

Figure 8: Example of development in four age steps
steered by three change operators (target shape superimposed
in GOA=2).



4.2 Artificial junk DNA 

In our model, at any given moment in the course of evolution,
an individual's driver cell tree generated during
development can be divided into i) driver cells (i.e. their
GET values) that activate an operator during development
and ii) driver cells that do not activate any operator
during development. In the same way the individual's
genome is composed of i) operators (i.e. their XET values)
that become active during development and ii)
operators that do not become active during development.
By analogy with real genomes, elements in the two categories
labelled with ii) can be defined as "junk" driver
cells (GET values) and "junk" operators (XET values) respectively.
A schematic representation of this distinction
is given in figures 8 and 9: figure 8 shows an example of
development in four steps and figure 9 shows the corresponding
driver cell tree and genome, where grey circles
represent junk elements. For ease of reference the set
of driver cells active during development will be called
DGT-D ("Driver Cell Tree D", D for "development"),
while the set of driver cells not active during development
will be called DGT-I (I for "inactive"). Analogously, the
set of operators active during development will be called
GEN-D ("Genome D") and the set of operators not active
during development will be called GEN-I. In the rest
of this section we will argue that the presence of junk in
both the driver cell tree and the genome is an inescapable
phenomenon, intimately linked to the functioning of the
epigenetic tracking machine.
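The partition into active and junk elements amounts to a pair of set operations, as the following Python sketch suggests. The function `classify_junk` is hypothetical, and the sketch makes a simplifying assumption: an element counts as active whenever a GET value equals an XET value, ignoring the timing (XOA) and precedence parameters of the operators.

```python
from typing import Dict, Iterable, Set, Tuple

Get = Tuple[int, ...]


def classify_junk(driver_gets: Iterable[Get],
                  operator_xets: Iterable[Get]) -> Dict[str, Set[Get]]:
    """Split GET and XET values into matching (D) and non-matching (I) sets.

    Simplification: a match is plain equality of GET and XET values.
    """
    gets, xets = set(driver_gets), set(operator_xets)
    matched = gets & xets
    return {
        "DGT-D": matched,          # driver cells that activate an operator
        "DGT-I": gets - matched,   # "junk" driver cells
        "GEN-D": matched,          # XET values of operators that become active
        "GEN-I": xets - matched,   # "junk" operators
    }


gets = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (2, 1, 0)]
xets = [(0, 0, 0), (2, 0, 0), (3, 3, 0)]
parts = classify_junk(gets, xets)
print(sorted(parts["DGT-I"]))   # → [(1, 0, 0), (2, 1, 0)]
print(sorted(parts["GEN-I"]))   # → [(3, 3, 0)]
```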

Figure 9: The corresponding driver cell tree. Grey circles
represent GET values that match no XET values and
XET values that match no GET values (junk elements);
green circles represent GET values and XET values that
match; brown squares represent right parts of operators.
(Panel labels: driver cell tree; genome (operators); list of GET
values created.)

The variable GET, as described in section 2, is organised
as an array of n integers, where n is the number of
artificial age steps. A legitimate question is which consid- 
erations motivated this design choice and, in particular, 
why an array structure was preferred to a simpler scalar 
structure. In reality the first idea was to use a scalar 
GET value, along with a global counter, subject to the 
following rules: 

• The global counter GNT stores the last GET value
assigned in a proliferation event that occurred anywhere
in the shape; it is incremented by one at each new
assignment;

• The zygote's GET value is zero. Each time a pro- 
liferation event takes place, the GET value assigned 
to the first newly created driver cell is the value held 
by the global counter; subsequent values are deter- 
mined adding one at each new assignment (the global 
counter is updated correspondingly). 

This "scalar approach", an example of which is given in
figure 10, has two major drawbacks. The first drawback is
the need for a global variable (the global counter) that
has to be accessed constantly by all the organism's cells
in order to be kept updated: for this reason the approach
is not biologically plausible. The second drawback is that
it creates "dependencies" between parts of the developing
shape that would otherwise be unrelated. Figure 11 shows
an example of such a dependency. Panel A represents the
starting point of development, with GOA=-1 and GNT=1.
Let us now imagine that evolution invents the
developmental path A-B-C ("path X"), in which the part of
the shape circled in red remains undeveloped. The most
straightforward way for evolution to develop this part is
to cast a proliferation event on the driver cell with the
CET value 4 (circled in black), thus exploring a new
developmental path ("path Y").

Figure 10: Example of development in four steps carried
out with a scalar CET and a global counter.

Figure 11: The utilisation of a scalar CET limits the
effectiveness of the GA in exploring the search space.

In this new path, the proliferation event takes place at
GOA=1 (panel D). As a result, the drivers with CET values
6-9 either no longer exist or are placed in different
positions (compared to path X). As a further result, the
operators acting on these values now have random effects
and the area circled in panel D remains unfilled, bringing
the fitness of the path Y individual down compared to the
fitness of the path X individual. Faced with choosing
between paths X and Y, the Genetic Algorithm will
therefore choose path X, which has an overall higher
fitness (the unfilled area in panel C is smaller than the
unfilled area in panel D): the ultimate effect is that path
Y will not be selected for and the circled region in panel
C will remain unfilled. The Genetic Algorithm still has
the possibility of using a timed operator (i.e. an operator
whose activation is conditional on GOA taking a certain
value, in this case greater than 2), but such operators are
harder to evolve and the effectiveness of the evolutionary
process is in any case reduced.
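The dependency described above can be reproduced with a few lines of bookkeeping. The following is only a minimal sketch: the class name `ScalarCounter`, the starting value of 1 and the reading of the counter as "next value to hand out" are assumptions made for illustration, not details taken from the model's implementation.

```python
class ScalarCounter:
    """Toy global counter handing out consecutive scalar CET values."""

    def __init__(self, start=1):
        # Assumption: the counter holds the next value to assign.
        self.next_value = start

    def proliferate(self, n_new_drivers):
        """Assign consecutive CET values to the drivers of one event."""
        values = list(range(self.next_value, self.next_value + n_new_drivers))
        self.next_value += n_new_drivers
        return values


# Path X: event P, then event Q.
x = ScalarCounter()
p_in_x = x.proliferate(3)
q_in_x = x.proliferate(2)

# Path Y: same events, but an extra proliferation occurs before Q.
y = ScalarCounter()
p_in_y = y.proliferate(3)
extra = y.proliferate(4)
q_in_y = y.proliferate(2)

# Event Q now receives different CET values, although nothing about Q changed.
print(q_in_x, q_in_y)
```

In this toy run the drivers created by event Q carry different values in the two paths, so any operator evolved to match the path X values no longer finds them in path Y.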

The introduction of an array version for the CET
overcomes these two problems at once. Besides being
biologically plausible, it decouples the CET values
generated in different proliferation events, thus
eliminating the risk of unwanted dependencies.
Unfortunately a new problem arises, related to the size of
the Genetic Algorithm's search space. With an array of n
numbers, each with, say, 10 possible values, the total
number of array values is 10^n: this is the size of the
space the GA has to search. As n grows beyond fairly small
values, this number quickly becomes very large and the GA
search space becomes unmanageable, virtually bringing
evolution to a halt. The solution to this problem consists
in a procedure called "Germline Penetration" (see figure
12). This procedure acts on the genome of each individual
at the end of development, copying at random CET values
that occurred during development onto the XET values of
change operators in the genome, as a suggestion to the GA
on where to search: with this procedure in place, the
effectiveness of the GA is restored. To avoid disrupting
development, the "transplanted" operators are set as
inactive, so that they start their "career" as junk operators.

Figure 12: Germline Penetration transfers values from
the driver cell tree to the genome (copied values are
highlighted with a thicker border).
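The procedure can be sketched as follows. This is a hedged illustration only: the function name `germline_penetration`, the dictionary fields `xet`, `used` and `active`, and the restriction to operators unused during development are all assumptions introduced here, since the text does not specify how target operators are chosen.

```python
import random


def germline_penetration(genome, cet_seen, n_copies, rng):
    """Copy CET values seen in development onto XETs of some operators.

    Returns a new genome; the overwritten operators are marked inactive
    so that they start out as junk and do not disturb development.
    """
    new_genome = [dict(op) for op in genome]
    # Assumption: only operators not used in development may be overwritten.
    eligible = [i for i, op in enumerate(new_genome) if not op["used"]]
    for i in rng.sample(eligible, k=min(n_copies, len(eligible))):
        new_genome[i]["xet"] = rng.choice(cet_seen)
        new_genome[i]["active"] = False
    return new_genome


# Hypothetical genome: one operator used in development, two unused.
genome = [
    {"xet": (1, 0), "used": True, "active": True},
    {"xet": (7, 7), "used": False, "active": True},
    {"xet": (8, 8), "used": False, "active": True},
]
cet_seen = [(1, 0), (1, 1), (1, 2), (2, 0)]  # CET arrays created in development
child = germline_penetration(genome, cet_seen, n_copies=2, rng=random.Random(0))
```

The point of the sketch is that the GA no longer has to guess XET values blindly in a space of size 10^n: candidate values are drawn from the much smaller set that actually appeared in the driver cell tree.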

With the previous considerations in mind we can now
turn our attention to junk DNA. The epigenetic tracking
machine, the way it is conceived, cannot do its job
without generating a lot of junk CET values. In the
development of the dolphin (figure 4), for instance, 471
CET values are generated and only 27 are used; in the
development of the frog (figure 5), 555 CET values are
generated and only 25 are used; across the experiments
performed, the average ratio (CET values used / CET values
generated) is around 5%. The presence of such a high
percentage of unused CET values seems to be a waste
of resources: we could ask if there is a way to reduce
it, improving the perceived efficiency of the technique.
The most straightforward way to reduce the amount of
unused CET values consists in decreasing the ratio
between driver and normal cells in proliferation events: for
instance, instead of the value of 1:25 used in our
experiments, we could use a value of 1:10. This would cause
the evolved shapes to have, at any age, a sparser
distribution of driver cells (a lower density of "yellow
dots") and, as a result, the "paintbrush" would become less
precise and evolution of development would become harder.

So, there seems to be a trade-off between the preci- 
sion of painting and the density of driver cells (and hence 
the percentage of junk GET values): the second aspect 
cannot be improved without worsening the first. Since 
the effectiveness in evolving shapes is the primary objec- 
tive of the method, in this regard we are not prepared to 
make concessions: therefore, a certain amount of unused 
CET values must be reckoned with. The conclusion of
the discussion so far can be expressed by saying that the 
following two facts have emerged as inescapable, inherent 
characteristics of our method: 

• the presence of a high percentage of junk CET
values, to allow for a sufficient precision of the painting
process;

• the need to use the procedure called Germline
Penetration, to allow the GA to work with an
array-structured CET variable, thus eliminating the
drawbacks deriving from the use of a scalar CET variable.

Putting these two elements together, it would not be
surprising to observe that Germline Penetration acts like
a shuttle, transferring junk CET values from the driver
cell tree onto a corresponding number of junk XET values
in the genome (with the hope that they meet each other
and will not be junk anymore!): indeed, this is exactly
what we saw happening in our experiments. Since in these
experiments the total number of change operators is kept
fixed, the percentage of junk XET values in the genome
does not, in general, match the percentage of junk CET
values in the driver cell tree. Going back to the frog
example, since the number of active operators is 25 and
the total number of operators in the genome is fixed at,
say, 100, the percentage of junk XET values is (100-25)/100
= 75%, significantly lower than the corresponding
percentage of junk CET values (95%): should we allow the
genome to have a flexible size, we could expect these two
percentages to be roughly equal.
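The percentages quoted above follow directly from the counts given in the text; the short check below simply redoes the arithmetic. The helper name `junk_fraction` is ours, and the genome size of 100 is the illustrative figure used in the frog example, not a measured value.

```python
def junk_fraction(used, total):
    """Fraction of elements that are never used (junk)."""
    return (total - used) / total


dolphin_cet_junk = junk_fraction(27, 471)   # CET values: 27 used of 471
frog_cet_junk = junk_fraction(25, 555)      # CET values: 25 used of 555
frog_xet_junk = junk_fraction(25, 100)      # XET values: 25 active of 100 (illustrative)

print(f"dolphin CET junk: {dolphin_cet_junk:.1%}")
print(f"frog CET junk:    {frog_cet_junk:.1%}")
print(f"frog XET junk:    {frog_xet_junk:.1%}")
```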

The conclusion of this section, therefore, is that the
presence of junk information in both the driver cell tree
and the genome is a fact that is inescapably connected
to the core of the epigenetic tracking machine, a
requirement essential to its evolvability. Finally, we
would like to conclude with two observations of a more
speculative nature. Firstly, should we hypothesise the
existence of a mechanism akin to Germline Penetration also
in biological systems, we would be naturally led to think
of mobile DNA elements, or transposons, as the actual
device used to carry the CET values from the biological
equivalent of driver cells, spread throughout the body, to
the germline cells, where they would deliver the recipe of
current development as a suggestion for future
improvements. Secondly, we note that, by allowing the
developmental history of an organism to influence its
genome and therefore to be passed on to the next
generation, the mechanism of Germline Penetration adds a
Lamarckian touch to the Darwinian evolution implemented by
the Genetic Algorithm.

5 Ageing 

5.1 Theories on ageing 

Ageing is the accumulation of changes in an organism over 
time, leading to a steady decline in bodily functions. It is 
not a universal phenomenon: in a few simple species, age- 
ing is negligible and cannot be detected (such species are 
not immortal, however, as their members will eventually 
fall prey to trauma or disease). The only measure that
has been shown to be effective in slowing the ageing process
in a wide variety of species is caloric restriction. Theories
that explain ageing have generally been divided between 
stochastic and programmed theories of ageing. Stochastic 
theories blame environmental impacts that induce cumu- 
lative damage on living organisms at various levels as the 
cause of ageing. Programmed theories imply that ageing 
is regulated by biological clocks operating throughout the 
life span; this regulation would depend on changes in gene 
expression that affect the systems responsible for mainte- 
nance, repair and defense responses. 

Within the first category, the wear-and-tear theory 
maintains that changes associated with ageing are the re- 
sult of chance damage that accumulates over time (not 
unlike the "ageing" of a mechanical device). Accord- 
ing to the somatic mutation theory, ageing results 
from damage to the genetic integrity of the body's cells. 
The accumulative-waste theory points to a buildup
in cells of waste products that presumably interferes with
metabolism. The free-radical theory is based on the
idea that free radicals create damage that gives rise to 
symptoms we recognise as ageing. 

Within the second category, the ageing-clock theory
argues that ageing results from a preprogrammed
sequence, as in a clock, built into the operation of the
nervous or endocrine system of the body. In rapidly
dividing cells the shortening of the telomeres (structures
at the ends of chromosomes that have experimentally been
shown to shorten with each successive cell division) would
provide just such a clock. The reproductive-cell cycle
theory is built around the idea that ageing is regulated
by reproductive hormones that act in an antagonistic
pleiotropic manner via cell cycle signaling: they
promote growth and development early in life in order
to achieve reproduction, but later in life become
dysregulated and drive senescence. Evolutionary theories
share the main assumption that ageing has evolved
because of the increasingly smaller probability of an
organism still being alive at older age, due to predation
and accidents. It is thought that strategies which result
in a higher reproductive rate at a young age, but shorter
overall lifespan, result in a higher lifetime reproductive
success and are therefore favoured by natural selection.
The disposable soma theory (Kirkwood, 1977) and the
mutation accumulation theory belong to this group.

Figure 13: Example of artificial ageing. The global clock
is allowed to take the values 3-6 after the moment of
reproduction. In these steps some operators are activated
and affect the phenotype, leading to a worse fitness value
in step 6.

Figure 14: Example of artificial ageing. The CET/XET
values that are the determinants of the fitness
deterioration are shown in purple in both the driver cell
tree and the genome. The CET array positions corresponding
to the ageing GOA steps are shown in grey.

5.2 Artificial Ageing

As we have seen in previous sections, for a given
individual development unfolds in n artificial age steps
(n=8 in most experiments performed) and, at the end of it,
the individual's fitness value is evaluated. The moment at
which the fitness value is evaluated coincides with the
moment at which the genome content is frozen and is handed
over to the genetic operators of reproduction, cross-over
and mutation, in order to be passed on to the next
generation. This moment, which has its biological
counterpart in the moment of reproduction, has always
coincided in our experiments with the end of the
simulation; on the other hand, we can imagine letting the
global clock GOA go on and seeing what happens in the
period after the moment of reproduction. The distinction
between the periods before and after reproduction can be
thought to correspond to the biological periods of
development (say, until 25 years of age in humans) and
ageing (from 25 years of age to the moment of death). By
analogy, we will call the period (i.e. the set of age
steps) before fitness evaluation "artificial development"
and the period after fitness evaluation "artificial ageing".

As pointed out in section 4, at the end of development
there are many junk driver cells, as well as many junk
operators. This stock of junk represents a reservoir of
events that can potentially be triggered after the moment
of fitness evaluation, in the period that we called
artificial ageing. Since these events occur after fitness
evaluation, their effects by definition do not affect the
fitness value; for this reason they will have a random
nature: they can be thought of as random noise superimposed
on the phenotype produced by the work of the operators
subject to the evolutionary pressure. Given their random
nature, the effects of such events on the overall fitness
of the phenotype are (much) more likely to be detrimental
than beneficial. The set of driver cells not active during
development, indicated with DCT-I, can therefore be further
subdivided into a set of driver cells that become active
during the ageing period and a set of driver cells that are
never active (the true junk): we will refer to these two
subsets as DCT-A ("Ageing") and DCT-J ("Junk").
Analogously, the set of operators can also be split into a
set of operators that become active during the ageing
period and a set of operators that are never active, GEN-A
and GEN-J respectively.
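The three-way subdivision can be stated compactly in code. The sketch below is illustrative only: the function `classify`, the element names and the activation steps are hypothetical, and the boundary (development ends at step 9, as in the "face" example) is taken as a parameter.

```python
def classify(first_active_step, last_dev_step):
    """Label an element D (development), A (ageing) or J (true junk).

    first_active_step is the clock value at which the element first
    becomes active, or None if it never does.
    """
    if first_active_step is None:
        return "J"
    if first_active_step <= last_dev_step:
        return "D"
    return "A"


# Hypothetical operators and the step at which each first fires.
first_active = {"op0": 0, "op1": 4, "op2": 12, "op3": None, "op4": 17}

subsets = {"D": [], "A": [], "J": []}
for name, step in first_active.items():
    subsets[classify(step, last_dev_step=9)].append(name)

print(subsets)
```

The same labelling applies to driver cells (DCT-D, DCT-A, DCT-J) and to operators (GEN-D, GEN-A, GEN-J); the inactive sets of section 4 are the union of the A and J subsets.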

An example of artificial ageing is reported in figure 13
(figure 14 shows the corresponding driver cell tree). The
shape evolved in figure 8 is now allowed to develop further
(the global clock takes the values 3-6), after the moment
of fitness evaluation (which takes place at the end of step
2). In these steps some operators become active and affect
the phenotype, leading to a worse fitness value in step 6
(the operators that are the cause of this worsening are
shown in purple in the driver cell tree of figure 14).
Figures 15 and 16 show a demonstration of this phenomenon
for a "face" shape (picture of 100x100 size with 16 grey
shades); figure 15 refers to the period of development
(steps from 0 to 9): the shape grows from the single cell
stage to the mature phenotype in step 9, when fitness is
evaluated; figure 16 shows steps from 10 to 19, which
belong to the period of ageing. As we can see, this period
is characterised by the accumulation of random events (both
proliferation and apoptosis), whose global effect consists
in a progressive deterioration of the quality of the image.

Figure 15: The "face", period of development (steps 0-9).
The shape grows from a single cell to the mature phenotype
in step 9, when fitness evaluation takes place.

Figure 16: The "face", period of ageing (steps 10-19).
The picture quality deteriorates steadily under the action
of random operators.

In conclusion, the phenomenon of artificial ageing can
be thought of as being determined by the cumulative action
of change events whose effects manifest themselves after
the moment of fitness evaluation, a scenario consistent
with the explanation provided by the "mutation
accumulation theory" for biological ageing. At the end of
this section, we wish to dedicate a final remark to the
role played by the junk in the phenomenon of artificial
ageing. In section 4 the sets of driver cells/operators
inactive during development, indicated with DCT-I and GEN-I
respectively, were shown to be a useful reservoir of new
change operators and an indispensable tool to explore new
evolutionary paths. In this section we have shown how
a part of it (DCT-A and GEN-A) is actually devoted to
causing random events that manifest themselves after the
moment of reproduction, relegating to DCT-J and GEN-J
the role of true junk. On the other hand, we point
out that the border between DCT-A and DCT-J, and between
GEN-A and GEN-J, is highly permeable and the average size
of DCT-A and GEN-A is proportional to the size of DCT-J
and GEN-J (see figure 17). These considerations lead
us to deduce, in the epigenetic tracking "world", a direct
link between the evolvability of a species and its
susceptibility to ageing, both features being mediated by
the presence of a big stock of junk. The fact that bats
have unusually small genomes and display a remarkably long
lifespan (i.e. they appear to age less) among mammals
of comparable dimension (Van den Bussche et al. 1995)
could hint at the existence of a similar link in real
biological systems as well.

6 Carcinogenesis 

6.1 Experimental evidence and theories 
on carcinogenesis 

So far, we have described how the model works under
"normal", or physiological, circumstances. Now we will
enter the realm of artificial pathology; in other words
we will analyse possible malfunctions of the model and
we will show how these malfunctions give rise to phenomena
that can be considered the artificial equivalent of
carcinogenesis, the process by which tumours are formed.
Despite having been the subject of intensive investigation,
carcinogenesis has failed to disclose all its secrets:
as of today, no model has succeeded in the task of
explaining all experimental evidence and many questions
remain unanswered. In the next sections we will show
how, under certain conditions, the model can produce
"malfunctions" analogous to carcinogenesis; the presence
of junk material in both the driver cell tree and the
genome will once again be shown to play a key role
in inducing such malfunctions.

Figure 17: The organisation of the DCT and the genome.

Cancer is a class of diseases in which a group of cells
displays uncontrolled growth, invasion and sometimes
metastasis; these three malignant properties of cancers
differentiate them from benign tumours, which are
self-limited and do not invade or metastasise. From a cell
biology perspective, cancer cells are conjectured to have,
compared to normal cells, the following "superpowers"
(Hanahan and Weinberg 2000): i) they grow even in the
absence of normal "GO" signals; ii) they grow despite
"STOP" commands issued by neighbouring cells; iii) they
evade built-in autodestruct mechanisms; iv) they are able
to stimulate blood vessel construction; v) they are
effectively immortal; vi) they have the power to invade
other tissues and spread to other organs. Overall the
incidence of cancer rises with age, increasing rapidly
during the fourth decade of life (in humans) and continuing
to increase thereafter, but more slowly in the fifth, sixth
and seventh decades. Epidemiological studies have proved
beyond doubt that both genetic and environmental factors
are implicated in carcinogenesis.

The mainstream theory of carcinogenesis states
that carcinogenesis is a multi-step process that can take
place in any cell, driven by damage (mutations) to genes
(oncogenes and tumour-suppressor genes) that normally
regulate cell proliferation, which in turn upsets the
normal balance between cell proliferation and cell death
and results in uncontrolled cell division and tumour
formation. A more recent theory differs from the standard
theory in tracing back the origin, the maintenance
and the spread of a tumour to a relatively small
subpopulation of cells called cancer stem cells, whereas
the bulk of the tumour would actually be composed of
non-tumorigenic cells that, deprived of the cancer stem
cells, would quickly shrink and disappear.

A few cancer-related genes, such as p53, do seem to be
mutated in the majority of tumours. But many other
cancer genes are changed in only a small fraction of cancer
types, a minority of patients, or a handful of cells within
a tumour. Moreover, some of the most commonly altered
cancer genes have oddly inconsistent effects. B. Vogelstein
found that the much studied oncogenes c-fos and c-erbb3
are curiously less active in tumours than they are
in nearby normal tissues. The tumour suppressor gene
RB was recently shown to be hyperactive, not disabled,
in some colon cancers, and, perversely, it appears to
protect those tumours from their autodestruct mechanisms.
In conclusion, the attempt to trace back tumour formation
to a subset of mutated genes, consistently found in
all tumours, has so far been unsuccessful.

In (Gibbs, 2003) three non-standard theories are
reviewed. In order to account for the number of mutations
(which are very rare events) required to turn a cell
cancerous, the "modified dogma theory" (Loeb et al. 2003)
hypothesises that something (a carcinogen, reactive
oxidants, or perhaps a malfunction in the cell's DNA
duplication and repair machinery) dramatically accelerates
the mutation rate; this theory thus adds a prologue to the
accepted life history of cancer, but the most important
factors are still genetic mutations. In the "early
instability theory" (Jallepalli and Lengauer, 2001),
chromosomal instability occurs early on and represents the
first step in carcinogenesis; in this hypothesis, there are
several master genes critical for mitosis: if as few as one
of these genes is disabled, the cell stumbles each time it
divides, muddling some of the chromosomes into an aneuploid
state; genetic mutations then lead to a benign tumour,
converted later, through additional mutations, to cancer.
According to the "all-aneuploidy theory" (Duesberg
2005), cancer cells are aneuploid because they start that
way. Many factors can interfere with a dividing cell so
that one of its daughter cells is cheated of its normal
complement of chromosomes and the other daughter is
endowed with a bonus; unlike the other theories, the
all-aneuploidy hypothesis predicts that carcinogenesis is
more closely connected to the assortment of chromosomes
than to genetic mutations.



Mathematical models of cancer (see (Wodarz and
Komarova 2006) for a comprehensive review) have found
application in three major areas: i) modelling in the
context of epidemiology and other statistical data; ii)
mechanistic modelling of avascular and vascular tumour
growth (including physical properties of biological
tissues); iii) modelling of cancer initiation and
progression as somatic evolution. The basic mathematical
tools used are ordinary differential equations, partial
differential equations, stochastic processes, cellular
automata and agent-based models. To our knowledge, most
mathematical models stick to the standard theory, are based
on differential equations and have the primary objective of
explaining the dynamics of tumour growth; in such models
cells have always been treated as "black boxes". Fewer
models have tried to open the black box and provide an
explanation of carcinogenesis based on the interplay of
cell components.

6.2 Artificial Carcinogenesis 

In the remainder of this section we will stick to the as- 
sumption, shared by most cancer theories, that every tu- 
mour has its origin in a single cell, that we will call Orig- 
inating Cell (OC). The possible occurrences are broken 
down into four scenarios, characterised by the following 
driving events: 

1. Activation of a timed operator in the ageing period
(no mutations);

2. Mutation(s) to the cell CET value or to the XET
value of an operator in the cell genome, with
activation of a junk CET-XET couple;

3. Mutation(s) to the cell CET value or to the XET
value of an operator in the cell genome, with
activation of a non-junk CET-XET couple;

4. Damage to the mechanism of generation of new CET
values.

We will now consider in detail each of these scenar- 
ios, analysing the properties of the produced phenomena 
along three main axes: 1) the rate of incidence in relation 
to the age; 2) the degree of "randomness"; 3) the degree 
of self-sustainability of the growth process. 

6.3 Scenario 1: activation of a timed op- 
erator in the ageing period (no mu- 
tations) 

As pointed out in section 5, active driver cells/operators
can be subdivided into driver cells/operators active during
development and driver cells/operators active during
ageing: the first scenario we consider is characterised by
the activation of a (CET,XET) couple belonging to
(DCT-A,GEN-A), by means of a timed operator. As described
in section 2 and recalled in section 4, a timed operator
is an operator that is not activated as soon as a matching
driver cell appears, but waits until the variable GOA
takes a certain predefined value, stored in the operator's
variable XOA. In the scenario under consideration, GOA
reaches the XOA value well after development is completed,
during what we called the artificial ageing period. If the
operator codes for a proliferation event, the cell will
proliferate but, unless the CET values generated during
this proliferation activate other change operators (a very
unlikely occurrence), the proliferation will very quickly
come to a halt. If the operator codes for an apoptosis
event, it will likewise soon be halted (right after its
execution).
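The activation rule for a timed operator can be written down in a few lines. This is a minimal sketch under stated assumptions: the names `Operator` and `is_triggered` are hypothetical, an untimed operator is represented by `xoa=None`, and the clock test is taken to be an exact match between GOA and XOA, which is one reading of the description above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Operator:
    xet: Tuple[int, ...]        # value to be matched against a cell's CET
    xoa: Optional[int] = None   # clock value required; None means untimed


def is_triggered(op, cell_cet, goa):
    """True if the operator fires on this driver cell at clock value goa."""
    if op.xet != cell_cet:
        return False
    # Assumption: a timed operator fires only when GOA equals XOA.
    return op.xoa is None or goa == op.xoa


timed = Operator(xet=(2, 1), xoa=14)   # fires only at step 14
untimed = Operator(xet=(2, 1))         # fires as soon as the CET matches

print(is_triggered(untimed, (2, 1), goa=3))   # matching cell, any step
print(is_triggered(timed, (2, 1), goa=3))     # matching cell, too early
print(is_triggered(timed, (2, 1), goa=14))    # matching cell, in the ageing period
```

With development ending at step 9, as in the "face" example, the timed operator above would stay silent throughout development and fire only in the ageing period, which is the situation scenario 1 describes.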

This scenario is thus characterised by: i) a tendency 
to occur in the ageing period; ii) a random nature and 
iii) a non- "self-sustaining" character and hence a limited 
duration. The effects on the phenotype can be likened 
to the superposition of random noise on the concerted 
"dance" of the operators that build the phenotype until 
the moment of reproduction (and are as such subject to
the evolutionary pressure). Given their random nature, 
the effects of such events on the overall organismal fitness 
are more likely to be detrimental than beneficial; there- 
fore, we are naturally led to think of the phenomena just 
described as factors contributing to the process of artifi- 
cial ageing dealt with in section 5. 

A possible biological counterpart of this scenario is
represented by a limited, non-self-sustaining proliferation
of cells, like those giving rise to fibromas (benign
tumours composed of fibrous or connective tissue: they can
grow in all organs, arising from mesenchyme tissue) or to
cysts (closed sacs having a distinct membrane and division
from the nearby tissue: they may contain air, fluids, or
semi-solid material). These manifestations, even though of
a proliferative nature, are mostly benign (in that they do
not cause a sharp fall in the organismal fitness, only a
mild one) and are associated with the ageing phenotype. As
may be noted, this scenario does not involve mutations
(neither epigenetic nor genetic): malfunctions which are
indeed caused by mutations will be examined next.

6.4 Scenario 2: mutation(s) to the cell
CET value (epigenetic) or to the
XET value of an operator in the cell
genome (genetic), resulting in activation
of a junk (CET-XET) couple

We start by considering the case (scenario 2A) in which
the CET of the originating cell suffers an (epigenetic)
mutation that turns it into a CET value that activates a
change operator in the genetic memory (by matching its
XET value). Alternatively (scenario 2B), it can be the
XET of an operator that suffers a (genetic) mutation that
turns it into a value that matches the cell's CET value
and, as a result, the operator is activated. In a third
case (scenario 2C, a combination of 2A and 2B) both the
CET and the XET are hit by mutations. The common
feature of these cases is that a (CET,XET) couple that
used to be junk (i.e. used to belong to (DCT-J,GEN-J))
now becomes active. Even though the causes of this
scenario (involving epigenetic and/or genetic mutations)
and the causes of scenario 1 (no mutations involved) are
different, both scenarios have a similar outcome: if the
(formerly) junk operator codes for a proliferation event,
the cell proliferates but, unless the CET values generated
during the proliferation activate other change operators
(which is very unlikely), the proliferation comes
very quickly to a halt (the same holds for apoptosis).

Figure 18: "Physiological" proliferation in maintenance
driver cells.

Figure 19: Tumorigenic proliferation in maintenance
driver cells.

Since this scenario is caused by a mutation, in theory
it can occur at any age value, either during development
or during ageing (the (CET-XET) couple could
move from (DCT-J,GEN-J) either to (DCT-D,GEN-D)
or to (DCT-A,GEN-A)); on the other hand, since mutations
are supposed to be rare events, we can expect it
to be more frequent as age progresses (the tendency
of the incidence rate to increase with age simply
reflecting the time necessary for a relatively rare event
to occur); scenario 1, by contrast, occurs by definition
after the moment of reproduction: for this reason the
age-specific incidence rate patterns of scenario 1 and
scenario 2, although similar, may not coincide exactly.
This scenario is thus characterised by: i) a tendency to
occur more frequently as age progresses; ii) a "random
noise" nature and iii) a non-self-sustaining character
and hence a limited duration. The biological counterpart
is as in scenario 1.

6.5 Scenario 3: mutation(s) to the cell
CET value (epigenetic) or to the
XET value of an operator in the cell
genome (genetic), resulting in activation
of a non-junk CET-XET couple

The prologue of scenario 3 is identical to that of scenario
2: a mutation hits either the cell CET variable (scenario
3A) or the XET variable in one of the genome's operators
(scenario 3B) or both (scenario 3C), causing the operator
to become active and carry out a change event. The
key difference is that, while in scenario 2 the activated
(CET-XET) couple used to be part of the junk, in scenario
3 it was not. Instead, the CET-XET match and
the execution of the relevant event also occurred during
(normal) development, where it gave rise to a part of the
artificial embryo that contributed to the overall
construction of the phenotype: this can also be expressed
by saying that the (CET-XET) couple after the mutation
belongs to (DCT-D,GEN-D). Given identical instructions,
the cell has no choice but to react identically: therefore
it will execute that part of development again, generating
other CET values, which in turn will activate other change
operators and so forth, until the system comes to a halt,
as it did during normal development.

We wish to point out the main characteristic of this sce- 
nario. While in the previous scenarios the proliferation 
comes very quickly to a halt, because the GET values 
generated are very unlikely to match any change opera- 
tor in the genome (they are all new, previously unseen 
values), in the present scenario the first GET value that 
triggers the change event is indeed a GET value used dur- 
ing development, and so will be (some of) the GET values 
generated during the proliferation it triggers. In fact this 
cascade of activations was evolved to construct the pheno- 
type: therefore it will not run out of steam very soon, but 
will go on for some time, as it did during development. 
The problem is that this cascade of activations now takes
place in the wrong place and at the wrong moment: therefore
its overall effect is likely to be akin to a massive
perturbation of the evolved phenotype, very different from
the random noise typical of the previous scenarios. This
scenario is thus characterised by: i) a tendency to occur
more frequently as age progresses (because of the rare-event
effect); ii) a nature unlike random noise (akin instead
to a massive perturbation); iii) a self-sustaining nature,
but with a limited duration (as development does not go
on forever).
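
The contrast between scenario 2 and scenario 3 can be
illustrated with a minimal sketch. It is not the actual
implementation of the model: the toy genome below, its GET
value names and the helper cascade_size are hypothetical, and
it assumes that an operator always generates the same GET
values whenever it is activated.

```python
from collections import deque

# Hypothetical toy genome (an assumption for illustration only): each change
# operator is keyed by its XET value and lists the GET values it assigns to
# the driver cells it creates.
GENOME = {
    "A": ["B", "C"],    # operators used during development
    "B": ["D"],
    "C": [],
    "D": [],
    "J": ["N1", "N2"],  # junk operator: its products match no operator
}


def cascade_size(genome, start_get):
    """Count the change events triggered once a cell acquires start_get."""
    events = 0
    queue = deque([start_get])
    while queue:
        get_value = queue.popleft()
        daughters = genome.get(get_value)
        if daughters is None:
            continue  # no operator has a matching XET value: nothing happens
        events += 1
        queue.extend(daughters)
    return events


junk_events = cascade_size(GENOME, "J")         # scenario 2: halts at once
development_events = cascade_size(GENOME, "A")  # scenario 3: re-runs a subtree
```

In this toy genome the junk couple produces a single event
(junk_events is 1), while the developmental couple replays
four events (development_events is 4) before halting.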

A possible biological counterpart of this scenario is ter- 
atoma, a tumour with tissue or organ components resem- 
bling normal derivatives of all three germ layers (rarely, 
not all three germ layers are identifiable). The tissues of 
a teratoma, although normal in themselves, may be quite 
different from surrounding tissues, and may be highly 
inappropriate, even grotesque: teratomas have been re- 
ported to contain hair, teeth, bone and very rarely more 
complex organs such as eyeball, torso, and hand. Usually, 
however, a teratoma does not contain organs but rather 
one or more tissues normally found in organs such as the 
brain, thyroid, liver, and lung. A possible path leading to 
an (artificial) teratoma is the following: the GET value 
of an (artificial) adult liver cell is turned into the GET 
value of a driver cell which in normal development is a 
precursor of an (artificial) hand; as a result, the mutated 
driver cell will try to generate the hand. Since this cell
finds itself in the wrong cellular surroundings, it will only
manage to mimic the normal development of the hand in
a grotesque fashion. We finally note that, while in our
model the occurrence of the artificial teratoma tends to 
increase as the age progresses, real teratomas tend to oc- 
cur more frequently during fetal development. 

6.6 Scenario 4: damage to the mechanism
of generation of new GET values
during proliferation

As recalled in section 3, adult stem cells are undifferentiated
cells found throughout the body after embryonic development
that divide to replenish dying cells and regenerate damaged
tissues. In the same section we showed how driver cells can be
considered the artificial counterparts of embryonic stem cells;
in order to include the analogue of adult stem cells in the
epigenetic tracking framework we must make a small modification
to the proliferation event as described in section 2. The
modification consists in adding the GET value "A" to the GET
values generated during the proliferation event, where "A" is
the GET value of the mother cell that triggered the
proliferation event in the first place (see figure 18). In
other words, the cell with GET value "A", while proliferating,
produces a copy of itself, which in turn will give rise to
another (identical) proliferation event.

In physiological conditions we can assume that the rate at
which "A" driver cells are generated is exactly counterbalanced
by the rate at which "A" driver cells are consumed or
destroyed, so that the net balance is zero (system in
equilibrium). If we call "percentage of replenishment"
(indicated with P) the percentage of "A" driver cells among all
daughter cells, we can refer to P0 as the value that keeps the
system in equilibrium. This functioning corresponds to the
behaviour of adult stem cells, which are generated to replenish
the stem pool whenever it becomes depleted (because too many
stem cells have gone down a differentiation path to perform
their specialised jobs in the organism). To distinguish the
newly introduced kind of proliferation from the standard one,
we will call the events/driver cells/operators active during
development "development events/driver cells/operators",
while those acting after development (i.e. during the ageing
period, equivalent to adult stem cells) will be called
"maintenance events/driver cells/operators".

Now, the stage for a more dangerous scenario (scenario 4A)
is set when a fault arises in the mechanism that cells use to
generate new GET values during a maintenance event of
proliferation type. Within this scenario many variants are
conceivable (the mechanism can be damaged in many ways): in
one possible variant the damage is such that the percentage P
is increased beyond the value P0 that maintains the system in
equilibrium. Figure 19 shows an example: instead of only one
cell bearing the same GET value as the mother, there are now
three. In this situation the originating cell and its
epigenetically identical progeny are stuck executing the same
operator, leading to an increased proliferation rate that
overshoots the destruction rate; the ultimate effect is what
will be perceived as uncontrolled proliferation: the
distinguishing mark of tumours. Scenario 4A is thus
characterised by: i) a tendency to occur later in life; ii)
a nature unlike random noise; iii) a self-sustaining nature,
with unlimited duration.
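
The effect of raising P above P0 can be illustrated with a
small worked example. It is only a sketch under simplifying
assumptions that are not part of the model description: every
"A" driver cell proliferates once per step, is consumed by its
own proliferation, and no other destruction takes place, so
that P0 equals one divided by the number of daughter cells per
event. The function name pool_sizes and the figure of 8
daughter cells per event are hypothetical.

```python
from fractions import Fraction


def pool_sizes(p, daughters_per_event, steps, initial=1):
    """Size of the "A" driver-cell pool after each maintenance step.

    p is the percentage of replenishment P, expressed as a fraction of the
    daughter cells that inherit the GET value "A" of the mother.
    Assumption: each "A" cell is consumed by its own proliferation event.
    """
    copies_per_event = Fraction(p) * daughters_per_event
    sizes = [Fraction(initial)]
    for _ in range(steps):
        sizes.append(sizes[-1] * copies_per_event)
    return sizes


DAUGHTERS = 8                   # hypothetical number of daughters per event
P0 = Fraction(1, DAUGHTERS)     # one "A" copy per event: equilibrium
P_DAMAGED = Fraction(3, DAUGHTERS)  # three "A" copies per event, as in figure 19

equilibrium = pool_sizes(P0, DAUGHTERS, steps=3)
overshoot = pool_sizes(P_DAMAGED, DAUGHTERS, steps=3)
```

Under these assumptions the pool stays at 1, 1, 1, 1 when
P = P0, and grows as 1, 3, 9, 27 when three daughter cells
inherit the GET value of the mother.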

In the scenario just described the originating cell is a
maintenance driver cell, in which even under normal
circumstances a fixed percentage of the progeny shares the
GET value of the mother. But the same thing could also take
place in a development driver cell, in which under normal
circumstances no driver cells in the progeny share the GET
value of the mother. In this scenario (4B) cellular
proliferation behaviour goes from the paradigm of figure 1
directly to the paradigm of figure 19. The damage to the GET
value generation mechanism can occur either in a driver cell
active during development or in a driver cell active during
ageing; in the latter case, since the operator involved is
always timed, a peculiar phenomenon takes place: even though
the damage occurs early in life (in theory it can occur at
any artificial age), in order to manifest itself the damage
must wait until GOA takes the value stored in the operator's
XOA. This means that, between the moment of the damage and
the moment at which its effects become manifest, there can be
a long latency period. Scenario 4B is thus characterised by:
i) a tendency to occur later in life; ii) a nature unlike
random noise; iii) a self-sustaining growth pattern, with
unlimited duration.
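
The latency period can be made concrete with a short sketch.
It assumes, for illustration only, that GOA behaves as a clock
that advances by one unit per step and that a timed operator
fires exactly when GOA equals its XOA; the function name
first_manifestation and the ages used are hypothetical.

```python
def first_manifestation(damage_age, xoa, max_age):
    """Return the GOA value at which a damaged timed operator first fires.

    Returns None when the damage occurs after the operator's XOA, a case
    this sketch does not attempt to model.
    """
    for goa in range(max_age + 1):  # GOA treated as a step counter (assumption)
        damaged = goa >= damage_age
        if damaged and goa == xoa:
            return goa
    return None


DAMAGE_AGE = 2   # hypothetical age at which the mechanism is damaged
XOA = 10         # hypothetical activation time stored in the operator

onset = first_manifestation(DAMAGE_AGE, XOA, max_age=20)
latency = onset - DAMAGE_AGE
```

With these values the damage occurs at age 2 but becomes
manifest only at age 10, a latency of 8 steps.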

The biological counterpart for both scenario 4A and 4B is
represented by benign tumours, expanding with varying degrees
of speed, which roughly corresponds to "superpowers" 1 and 2
of subsection 6.1: further genetic mutations are hypothesised
to be necessary in order to acquire the additional
superpowers that are the hallmark of cancer. In this view
carcinogenesis is driven by a combination of structural
damage (damage to the biological equivalent of the GET value
generation mechanism) and genetic mutations; this appears to
fit quite well with the experimental evidence, which seems to
require structural damage in the cell, in addition to genetic
mutations, for carcinogenesis (a requirement acknowledged in
the "early instability theory" and in the "all-aneuploidy
theory"). The long latency between the supposed moment at
which the damage occurs (e.g. exposure to tobacco smoke) and
the moment at which cancer breaks out (e.g. lung cancer) is a
well known phenomenon, whose artificial counterpart, as we
have seen, our model is able to reproduce. In conclusion, our
model can be seen as a combination of the "cancer stem cell
theory" and the "all-aneuploidy theory".

7 Conclusions

We presented a model of cellular growth called "epigenetic
tracking" and demonstrated its effectiveness in generating
arbitrary shapes by means of evo-devo techniques. The model
was subsequently applied to the study of key biological
issues such as embryogenesis, the organisation of the genome,
ageing and carcinogenesis, showing that it can produce
artificial counterparts for each of them. We can conclude
that the proposed model has the potential to be the
foundation for a project to build a whole artificial biology,
displaying many aspects in common with real biology. Future
work will focus on some key aspects of the model that at
present are not biologically plausible, namely i) the
modelling of inter-cellular signalling and ii) the modelling
of the genetic regulatory network that is known to function
in real cells.